BTS 510 Lab 10

set.seed(12345)
library(tidyverse)
Warning: package 'purrr' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Stat2Data)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Run models with interaction effects to assess conditional effects of predictors
  • Probe interaction models to fully understand all effects

2 Data

  • ICU data from the Stat2Data package: n = 200
    • ID: Patient ID code
    • Survive: 1 = patient survived to discharge or 0 = patient died
    • Age: Age (in years)
    • AgeGroup: 1 = young (under 50), 2 = middle (50-69), 3 = old (70+)
    • Sex: 1 = female or 0 = male
    • Infection: 1 = infection suspected or 0 = no infection
    • SysBP: Systolic blood pressure (in mm of Hg)
    • Pulse: Heart rate (beats per minute)
    • Emergency: 1 = emergency admission or 0 = elective admission

3 Analysis

  • Replicate and extend the analysis from the lecture
    • Age, Emergency, and their interaction predict Pulse
    • Be sure to mean center Age for interpretability.

4 Tasks

  • Model 1: Pulse ~ Age + Emergency
  • Model 2: Pulse ~ Age + Emergency + Age*Emergency
library(Stat2Data)
data(ICU)
ICU <- ICU %>% mutate(AgeC = Age - mean(Age, na.rm = TRUE))
m1 <- lm(data = ICU, Pulse ~ AgeC + Emergency)
m2 <- lm(data = ICU, Pulse ~ AgeC + Emergency + AgeC*Emergency)
summary(m1)

Call:
lm(formula = Pulse ~ AgeC + Emergency, data = ICU)

Residuals:
   Min     1Q Median     3Q    Max 
-63.10 -18.13  -3.56  17.27  88.73 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 90.50761    3.68515  24.560  < 2e-16 ***
AgeC         0.09723    0.09527   1.021  0.30873    
Emergency   11.45223    4.31849   2.652  0.00866 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 26.48 on 197 degrees of freedom
Multiple R-squared:  0.03582,   Adjusted R-squared:  0.02603 
F-statistic: 3.659 on 2 and 197 DF,  p-value: 0.02753
summary(m2)

Call:
lm(formula = Pulse ~ AgeC + Emergency + AgeC * Emergency, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-63.557 -16.917  -3.928  16.011  86.799 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     94.4823     3.8476  24.556  < 2e-16 ***
AgeC            -0.5409     0.2323  -2.329  0.02088 *  
Emergency        7.7539     4.4091   1.759  0.08020 .  
AgeC:Emergency   0.7612     0.2537   3.001  0.00304 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.96 on 196 degrees of freedom
Multiple R-squared:  0.07817,   Adjusted R-squared:  0.06406 
F-statistic:  5.54 on 3 and 196 DF,  p-value: 0.001137
  1. Conduct a likelihood ratio test to compare the models. Report the results. Which model is preferred?
logLik(m1)
'log Lik.' -937.5404 (df=4)
logLik(m2)
'log Lik.' -933.0483 (df=5)
-2*(logLik(m1) - logLik(m2))
'log Lik.' 8.984209 (df=4)
anova(m1, m2, test = "LRT")
Analysis of Variance Table

Model 1: Pulse ~ AgeC + Emergency
Model 2: Pulse ~ AgeC + Emergency + AgeC * Emergency
  Res.Df    RSS Df Sum of Sq Pr(>Chi)   
1    197 138115                         
2    196 132048  1      6067 0.002692 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Model 2 (with the AgeC*Emergency interaction) is a better model than Model 1 (without the interaction), \chi^2(1) = 8.98, p<.01.
    • Significant LRT = more complicated model is better
  1. Plot the simple slopes with the data points. What is the general pattern of results?
  • ggplot
ggplot(data = ICU, aes(x = AgeC, y = Pulse, group = Emergency)) +
  geom_point(aes(color = as.factor(Emergency))) +
  geom_smooth(method = "lm", 
              se = FALSE,
              aes(color = as.factor(Emergency)))
`geom_smooth()` using formula = 'y ~ x'

  • interactions
library(interactions)
interact_plot(m2, 
              pred = AgeC, 
              modx = Emergency, 
              plot.points = TRUE)

  • It looks like the emergency admitted group increases pulse rate with age while the non-emergency admitted group decreases pulse with age.
  1. Conduct simple slopes analysis and the Johnson-Neyman procedure. Report the findings, including test statistics, degrees of freedom, and p-values.
sim_slopes(m2, 
           pred = AgeC, 
           modx = Emergency, 
           johnson_neyman = FALSE)
SIMPLE SLOPES ANALYSIS

Slope of AgeC when Emergency = 0.00 (0): 

   Est.   S.E.   t val.      p
------- ------ -------- ------
  -0.54   0.23    -2.33   0.02

Slope of AgeC when Emergency = 1.00 (1): 

  Est.   S.E.   t val.      p
------ ------ -------- ------
  0.22   0.10     2.16   0.03
johnson_neyman(m2, 
               pred = Emergency, 
               modx = AgeC, 
               alpha = .05)
JOHNSON-NEYMAN INTERVAL

When AgeC is OUTSIDE the interval [-44.31, 1.06], the slope of Emergency is
p < .05.

Note: The range of observed values of AgeC is [-41.55, 34.45]

mean(ICU$Age, na.rm = TRUE)
[1] 57.545
  • Is the slope with respect to age significant for each group?
    • The slope with respect to age (AgeC) is significant and negative for non-emergency admissions (Emergency = 0); b = -0.54, t(196) = -2.33, p=.02.
    • The slope with respect to age (AgeC) is significant and positive for emergency admissions (Emergency = 1); b = 0.22, t(196) = 2.16, p=.03.
  • For which values of age are the emergency groups different in pulse rate?
    • “When AgeC is OUTSIDE the interval [-44.31, 1.06], the slope of Emergency is p < .05”
    • When AgeC is less than -44.31 (Age = 57.545 - 44.31 = 13.235)
    • When AgeC is greater than 1.06 (Age = 57.545 + 1.06 = 58.605)
  1. Conduct outlier analysis for the model. Are there observations with extreme values on the predictors or predicted values? Are there observations that change the findings? Briefly report your findings.
library(broom)
Warning: package 'broom' was built under R version 4.5.1
library(car)
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
The following object is masked from 'package:purrr':

    some
m2_aug <- augment(m2)
m2_aug <- m2_aug %>% mutate(esresid = rstudent(m2))
head(m2_aug)
# A tibble: 6 × 10
  Pulse   AgeC Emergency .fitted .resid    .hat .sigma    .cooksd .std.resid
  <int>  <dbl>     <int>   <dbl>  <dbl>   <dbl>  <dbl>      <dbl>      <dbl>
1    96  29.5          1   109.  -12.7  0.0223    26.0 0.00140       -0.496 
2    88 -30.5          1    95.5  -7.51 0.0192    26.0 0.000417      -0.292 
3    80   1.45         1   103.  -22.6  0.00701   26.0 0.00134       -0.872 
4    70  19.5          0    84.0 -14.0  0.0329    26.0 0.00254       -0.547 
5    90  18.5          1   106.  -16.3  0.0134    26.0 0.00136       -0.632 
6   103  -3.55         1   101.    1.54 0.00683   26.0 0.00000613     0.0597
# ℹ 1 more variable: esresid <dbl>
ggplot(data = m2_aug,
       aes(x = c(1:nrow(m2_aug)), 
           y = .hat)) +
  geom_point() +
  geom_hline(yintercept = 3*(3+1)/200, 
             color = "red",
             linetype = "dashed") +
  geom_hline(yintercept = 2*(3+1)/200,
             color = "blue") +
  geom_text(aes(label=ifelse((.hat > 2*(3+1)/200), 
                             rownames(m2_aug), 
                             '')),
            hjust = 0, nudge_x = 2)

ggplot(data = m2_aug,
       aes(x = c(1:nrow(m2_aug)), 
           y = esresid)) +
  geom_point() +
  geom_hline(yintercept = 2, 
             color = "blue",
             linetype = "dashed") +
  geom_hline(yintercept = -2,
             color = "blue",
             linetype = "dashed") +
  geom_text(aes(label=ifelse((esresid > 2), 
                             rownames(m2_aug), 
                             '')),
            hjust = 0, nudge_x = 2) +
  geom_text(aes(label=ifelse((esresid < -2), 
                             rownames(m2_aug), 
                             '')),
            hjust = 0, nudge_x = 2)

ggplot(data = m2_aug,
       aes(x = c(1:nrow(m2_aug)), 
           y = .cooksd)) +
  geom_point() +
  geom_hline(yintercept = 1, 
             color = "red",
             linetype = "dashed") +
  geom_text(aes(label=ifelse((.cooksd > 1), 
                             rownames(m2_aug), 
                             '')),
            hjust = 0, nudge_x = 2)

  • See next question for write-up.
  1. Describe the overall findings for this model, including the analyses to probe the interaction. Be statistically accurate but avoid jargon and technical terms as much as you can. Be sure to use the names of the variables studied (i.e., pulse, age, emergency admission) rather than X and Y.
  • There were significant effects of AgeC, Emergency, and their interaction on Pulse. [Report these as before]
  • For non-emergency admissions, Pulse significantly decreased with Age.
  • For emergency admissions, Pulse significantly increased with Age.
  • Emergency and non-emergency admissions groups had significantly different expected Pulse rates for patients older than 58.605.
  • There were 8 observations with extreme leverage values and 7 observations with extreme externally standardized residuals, but no observations had concerning Cook’s D values (> 1), indicating that these extreme values were not changing the results.